Plasma extrachromosomal circular DNA as a potential diagnostic biomarker for nodular thyroid disease

The accurate diagnosis of cancerous nodules in nodular thyroid disease remains a significant challenge. 1,2 Extrachromosomal circular DNA (eccDNA), generated during apoptosis, shows tissue- and plasma-specific patterns and varying spectra in different diseases. However, its role in distinguishing benign from malignant thyroid nodules is unexplored. 3–9 This study leverages Circle-seq technology and machine learning to investigate the potential of eccDNA as a non-invasive biomarker for diagnosing thyroid cancer.

Plasma eccDNA from the papillary thyroid carcinoma (PTC) group showed longer fragments and higher guanine-cytosine (GC) content. Most eccDNA populations were under 1000 bases, peaking at around 202 and 338 bases, and eccDNA formation showed a preference for regions of high GC content (Figure S3A-C). Genomic annotation highlighted eccDNA's enrichment in untranslated and exonic regions, consistent with known profiles in health and disease (Figure 1F). To understand the distribution of eccDNA genomic locations in PTC patients, we aligned eccDNA sequences with the human genome and identified 450 gene-containing eccDNA (eccGenes) that were significantly more prevalent in the PTC group (Figure S4A). Gene ontology (GO) and gene set enrichment analysis (GSEA) indicated that these eccGenes frequently involve exons related to tissue growth and to the Wnt and GPCR signalling pathways (Figure S4B,C). Principal component analysis (PCA) highlighted distinct eccDNA gene region diversity associated with PTC (Figure 1G). Remarkably, an increased presence of miR-1203-related eccDNA circles was detected in the PTC group (Figure 2A,B). Transfection of synthesised miR-1203 eccDNA into thyroid cell lines (TPC-1, BHP10-3 and K1) led to significant transcriptional changes: 572 genes were upregulated and 1035 downregulated (Figure 2C). This shift in gene expression, involving numerous cancer-associated genes, underscores eccDNA's influence in oncogenesis (Figure 2D,E).
To investigate whether PTC tumour cells release eccDNA into the bloodstream, we established mouse xenograft models with the human PTC cell lines TPC-1, BHP10-3 and K1. We isolated eccDNA from plasma and confirmed the presence of human-origin eccDNA in the mice (Figure 3A). Sequencing revealed genomic features consistent with tumour-derived eccDNA, including GC content, motif patterns and chromosomal distribution, highlighting its potential as a cancer detection marker (Figure 3B-D).
To explore the diagnostic value of eccDNA in PTC, we analysed 308 837 genomic locations. Using an elastic net (E-net) logistic regression model and a two-nested leave-one-out strategy, 10 we assessed the link between these locations and disease status. We identified 71 critical eccDNA locations capable of differentiating PTC patients from nodular thyroid goitre (NOD) patients, forming a potential classification model. Comparing these 71 eccDNA locations with the gene annotations of the human genome GRCh38.p13 in Ensembl (version 108), we found that 18 locations showed no overlap with any known genes, while the remaining 53 locations overlapped with 71 known genes (Figure 4A and Table S2).
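The structure of a two-nested leave-one-out procedure can be illustrated with a minimal sketch. This is not the pipeline used in the study: the k-nearest-neighbour scorer is a deliberately simple stand-in for the E-net logistic regression model, and the function names, the toy data and the hyperparameter grid are all hypothetical. The point is only that the hyperparameter is chosen in the inner loop using outer-training subjects alone, so each subject receives a score from a model that never saw that subject.

```python
def knn_score(train_X, train_y, x, k):
    """Stand-in model (assumption): fraction of positive labels among
    the k training samples nearest to x (squared Euclidean distance)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loo_splits(indices):
    """Yield (training indices, held-out index) for leave-one-out."""
    for held_out in indices:
        yield [i for i in indices if i != held_out], held_out


def nested_loocv(X, y, param_grid):
    """Return one out-of-sample score per subject."""
    scores = []
    for outer_train, test_i in loo_splits(list(range(len(X)))):
        # Inner loop: pick the hyperparameter using outer-training subjects only.
        best_param, best_acc = None, -1.0
        for param in param_grid:
            correct = 0
            for inner_train, val_i in loo_splits(outer_train):
                s = knn_score(
                    [X[i] for i in inner_train],
                    [y[i] for i in inner_train],
                    X[val_i],
                    param,
                )
                correct += int((s >= 0.5) == bool(y[val_i]))
            acc = correct / len(outer_train)
            if acc > best_acc:
                best_param, best_acc = param, acc
        # Refit on all outer-training subjects and score the held-out subject.
        scores.append(
            knn_score(
                [X[i] for i in outer_train],
                [y[i] for i in outer_train],
                X[test_i],
                best_param,
            )
        )
    return scores


# Toy data (hypothetical): one feature per subject, 0 = NOD, 1 = PTC.
X = [[0.1], [0.2], [0.3], [0.4], [0.9], [1.0], [1.1], [1.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
scores = nested_loocv(X, y, param_grid=[1, 3])
print(len(scores))  # one prediction per subject
```

With n subjects the outer loop yields exactly n predictions, which is why a cohort of 72 subjects produces 72 out-of-sample scores for the subsequent ROC analysis.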
After completing the two-nested leave-one-out cross-validation (LOOCV) process, we generated 72 predictions for the 72 subjects. Receiver operating characteristic (ROC) curve analysis on these predictions and their true labels produced an area under the curve (AUC) of .754 (Figure 4B). To further evaluate the diagnostic efficacy of our model, we calculated more detailed performance metrics: an accuracy of 68.1%, sensitivity of 55.3%, specificity of 92.0%, positive predictive value of 92.9% and negative predictive value of 52.3%. The optimal cut-off point was also selected using the two-nested LOOCV approach. To address possible bias from random training and validation set splits, we conducted 2000 rounds of five-fold cross-validation, which gave a mean AUC of .789 with a standard deviation of .135. Additionally, a t-test comparing predictions across the two groups indicated a statistically significant difference (t = −3.9012, p = .00029; Figure 4C), suggesting that the 71 identified eccDNA locations could serve as a diagnostic biomarker for distinguishing between PTC and NOD patients.
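The reported performance metrics follow from a 2 × 2 confusion matrix by standard definitions. The sketch below shows the arithmetic. The counts used (26 true positives, 21 false negatives, 23 true negatives, 2 false positives) are an assumption: they are not stated in the text, and were chosen only because they sum to 72 subjects and reproduce the reported percentages after rounding. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Assumed counts (not reported in the text), consistent with 72 subjects.
metrics = diagnostic_metrics(tp=26, fn=21, tn=23, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed counts, the high specificity and positive predictive value alongside the lower sensitivity and negative predictive value indicate that positive calls are rarely wrong, while a substantial share of PTC cases falls below the chosen cut-off.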
This study explores the potential of plasma eccDNA as a non-invasive biomarker to distinguish between benign and malignant thyroid nodules, potentially improving the diagnosis of PTC. The elevated eccDNA levels found in PTC patients and mouse PTC xenograft models indicate that it is released from the tumour into the circulation, supporting its diagnostic value. The sample size is an important limitation of this study, which to some extent restricts both the scope of the conclusions and the performance of the model. A small sample size may lead to overfitting and reduce the credibility of the conclusions. To address this, we adopted two nested LOOCV loops to ensure that the test set in the outer loop is not involved in training. The results from the outer loop were used as the evaluation criteria, maximising data utilisation and guarding against overfitting. A post hoc sample size estimation indicated that our sample size meets the minimum requirement, supporting the credibility of the conclusions. In the future, we will expand the sample size to improve our model and conclusions.
Despite the limited number of samples, we identified the nucleosomal origin of, and genetic predisposition for, eccDNA formation in PTC, establishing eccDNA as a promising biomarker for cancer detection with the potential to improve the precision of thyroid cancer diagnostics.

SUPPORTING INFORMATION
Additional supporting information can be found online in the Supporting Information section at the end of this article.

FIGURE 1
Distinct plasma extrachromosomal circular DNA (eccDNA) profiles in papillary thyroid carcinoma (PTC) patients. (A) Experimental workflow for identifying eccDNA from plasma and tumour tissues. DNA extraction and enrichment from tissue and plasma samples, followed by removal of linear DNA using exonuclease and amplification of eccDNA through rolling circle amplification. Machine learning analysis was conducted to pinpoint eccDNA locations capable of differentiating PTC and nodular thyroid goitre (NOD) patients. (B-G) Genomic and sequence characteristics of plasma eccDNA. (B-E) The eccDNA counts per million mapped reads (EPM) value, number of eccDNA mapped to protein-coding genes, percent of eccDNA mapped to protein-coding genes and percent of reads aligned to repeats in each clinical group. (F) Genomic distribution of eccDNA in each clinical group. (G) Principal component analysis (PCA) of differential eccGenes in each clinical group. p values are shown on the graph.

FIGURE 2
miR-1203-related circles promote the progression of papillary thyroid carcinoma (PTC). (A) Heatmap showing the abundance of extrachromosomal circular DNA (eccDNA) containing full-length miRNA genes in the PTC and nodular thyroid goitre (NOD) groups. (B) Validation of miR-1203-related eccDNAs by agarose gel electrophoresis. (C) Synthesis of miR-1203 eccDNAs by the Ligase-Assisted Mini-circle Accumulation (LAMA) method. (D, E) Gene ontology (GO), KEGG and gene set enrichment analysis (GSEA) pathway analysis of miR-1203 eccDNA biological functions.

FIGURE 3
Characteristics of human extrachromosomal circular DNA (eccDNA) in tissue and plasma of mice bearing human papillary thyroid carcinoma (PTC) cell xenografts. (A) Summary of the number of eccDNA of mouse or human origin detected in xenograft mouse plasma. (B) GC content distribution of eccDNA and their upstream and downstream regions of equal length in tissue and plasma. (C, D) Motif patterns of eccDNA junction sites and chromosomal distribution in PTC cell line xenograft mouse tissue and plasma.

FIGURE 4
Evaluation of the critical locations of extrachromosomal circular DNA (eccDNA) for papillary thyroid carcinoma (PTC) diagnosis. (A) Summary of the 71 crucial locations for the classification model capable of distinguishing PTC and nodular thyroid goitre (NOD) patients. (B) Area under the curve (AUC) value from receiver operating characteristic (ROC) analysis using the 72 predictions and their corresponding true labels. (C) Boxplot of the 72 two-nested predicted scores calculated from the 71 locations for PTC and NOD patients. p values are shown on the graph.